GLMM_BINOMIAL
Overview
The GLMM_BINOMIAL function fits a Generalized Linear Mixed Model (GLMM) with a binomial family for analyzing binary outcomes in clustered or hierarchical data. GLMMs extend generalized linear models by incorporating random effects to account for correlation among observations within the same group or cluster, making them essential for multi-level analyses such as patients within hospitals, students within schools, or repeated measurements within subjects.
This implementation uses the BinomialBayesMixedGLM class from the statsmodels library, which employs variational Bayes estimation via the fit_vb() method. Unlike traditional maximum likelihood approaches, variational Bayes approximates the posterior distribution by finding a tractable distribution that minimizes the Kullback-Leibler divergence between itself and the true posterior. This method provides both point estimates (posterior means) and uncertainty assessments (posterior standard deviations) for all model parameters. For theoretical background on variational inference, see Blei, Kucukelbir, and McAuliffe (2017).
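As a rough sketch of this estimation path (separate from the full function below), the following fits a random-intercept binomial GLMM on synthetic data via fit_vb(); the sample size, group count, and data-generating values are illustrative assumptions:

import numpy as np
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(0)
n, n_groups = 200, 10
groups = rng.integers(0, n_groups, size=n)
x = rng.normal(size=(n, 1))
u = rng.normal(scale=0.8, size=n_groups)        # true random intercepts
eta = -0.5 + 1.2 * x[:, 0] + u[groups]          # linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))     # binary outcomes

exog = np.column_stack([np.ones(n), x])         # fixed intercept + slope
exog_vc = (groups[:, None] == np.arange(n_groups)).astype(float)
ident = np.zeros(n_groups, dtype=int)           # one shared variance parameter
fit = BinomialBayesMixedGLM(y, exog, exog_vc, ident).fit_vb()
print(fit.fe_mean, fit.fe_sd)                   # posterior means and SDs of fixed effects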
The model uses a logit link function to relate the linear predictor to the probability of success:
\text{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{u}
where \mathbf{X}_i\boldsymbol{\beta} represents fixed effects and \mathbf{Z}_i\mathbf{u} represents random effects with \mathbf{u} \sim N(0, \boldsymbol{\Sigma}). The function returns fixed effects coefficients with associated z-statistics, p-values, confidence intervals, and odds ratios, along with random effects variance components.
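For example, a fixed-effect coefficient of 0.7 on the logit scale corresponds to an odds ratio of exp(0.7) ≈ 2.01. A short illustration with made-up numbers:

import math
beta = 0.7                     # illustrative coefficient on the logit scale
odds_ratio = math.exp(beta)    # ≈ 2.01: odds of success roughly double per unit of x
eta = -0.5 + beta * 1.0        # linear predictor with an illustrative intercept of -0.5
p = 1 / (1 + math.exp(-eta))   # inverse logit gives the probability, ≈ 0.55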
The function supports random intercept models by default (accounting for baseline differences between groups) and can accommodate random slopes when a random effects design matrix is provided. Random effect standard deviation parameters use log-normal prior distributions, and fixed effects use Gaussian priors. For additional information on GLMMs, see the UCLA Statistical Consulting introduction and the statsmodels GLMM documentation.
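The indicator-based encoding of these random effects can be sketched as follows; it mirrors the construction used in the function later on this page, and the x_random values are illustrative:

import numpy as np

groups = np.array([1, 1, 2, 2, 3, 3])
unique_groups = np.unique(groups)
# Random intercepts: one 0/1 indicator column per group.
exog_vc = (groups[:, None] == unique_groups[None, :]).astype(float)
ident = np.zeros(len(unique_groups), dtype=int)  # columns share variance parameter 0
# Random slopes: scale each indicator column by the random-effects predictor
# and give those columns their own variance parameter.
x_random = np.array([1.0, 1.0, 1.5, 1.5, 2.0, 2.0])
exog_vc_full = np.hstack([exog_vc, exog_vc * x_random[:, None]])
ident_full = np.concatenate([ident, np.ones(len(unique_groups), dtype=int)])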
This example function is provided as-is without any representation of accuracy.
Excel Usage
=GLMM_BINOMIAL(y, x, groups, x_random, fit_intercept, alpha)
- y (list[list], required): Binary dependent variable as a column vector (0 or 1).
- x (list[list], required): Fixed effects predictors matrix where each column is a predictor.
- groups (list[list], required): Group or cluster membership as a column vector (integer coded).
- x_random (list[list], optional, default: null): Random effects predictors matrix. If omitted, uses a random intercept only.
- fit_intercept (bool, optional, default: true): If true, includes a fixed intercept in the model.
- alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1).
Returns (list[list]): 2D list with three sections: fixed-effects estimates (coefficient, standard error, z-statistic, p-value, confidence interval bounds, odds ratio), random-effects variance components, and model statistics; or an error message string.
Examples
Example 1: Binomial GLMM with random intercept only
Inputs:
| y | x | groups |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 1 |
| 0 | 1.5 | 2 |
| 1 | 2.5 | 2 |
| 1 | 3 | 3 |
| 0 | 1.2 | 3 |
Excel formula:
=GLMM_BINOMIAL({0;1;0;1;1;0}, {1;2;1.5;2.5;3;1.2}, {1;1;2;2;3;3})
Expected output:
"non-error"
Example 2: Binomial GLMM with random slopes
Inputs:
| y | x | groups | x_random |
|---|---|---|---|
| 0 | 1 | 1 | 1 |
| 1 | 2 | 1 | 1 |
| 1 | 1.5 | 2 | 1.5 |
| 0 | 2.5 | 2 | 1.5 |
| 1 | 3 | 3 | 2 |
| 1 | 1.2 | 3 | 2 |
Excel formula:
=GLMM_BINOMIAL({0;1;1;0;1;1}, {1;2;1.5;2.5;3;1.2}, {1;1;2;2;3;3}, {1;1;1.5;1.5;2;2})
Expected output:
"non-error"
Example 3: Binomial GLMM without fixed intercept
Inputs:
| y | x | groups | fit_intercept |
|---|---|---|---|
| 0 | 1 | 1 | false |
| 1 | 2 | 1 | |
| 0 | 1.5 | 2 | |
| 1 | 2.5 | 2 | |
| 1 | 3 | 3 | |
| 0 | 1.2 | 3 | |
Excel formula:
=GLMM_BINOMIAL({0;1;0;1;1;0}, {1;2;1.5;2.5;3;1.2}, {1;1;2;2;3;3}, FALSE)
Expected output:
"non-error"
Example 4: Binomial GLMM with 90% confidence intervals
Inputs:
| y | x | groups | x_random | fit_intercept | alpha |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | true | 0.1 |
| 1 | 2 | 1 | 1 | | |
| 1 | 1.5 | 2 | 1.5 | | |
| 0 | 2.5 | 2 | 1.5 | | |
| 1 | 3 | 3 | 2 | | |
| 1 | 1.2 | 3 | 2 | | |
Excel formula:
=GLMM_BINOMIAL({0;1;1;0;1;1}, {1;2;1.5;2.5;3;1.2}, {1;1;2;2;3;3}, {1;1;1.5;1.5;2;2}, TRUE, 0.1)
Expected output:
"non-error"
Python Code
import math
import numpy as np
from scipy import stats as sp_stats
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM
def glmm_binomial(y, x, groups, x_random=None, fit_intercept=True, alpha=0.05):
    """
    Fits a Generalized Linear Mixed Model (GLMM) with binomial family for binary clustered data.

    See: https://www.statsmodels.org/stable/mixed_glm.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Binary dependent variable as a column vector (0 or 1).
        x (list[list]): Fixed effects predictors matrix where each column is a predictor.
        groups (list[list]): Group or cluster membership as a column vector (integer coded).
        x_random (list[list], optional): Random effects predictors matrix. If omitted, uses random intercept only. Default is None.
        fit_intercept (bool, optional): If true, includes fixed intercept in the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with GLMM results, or error message string.
    """
    # Seed numpy's global RNG for reproducibility of the variational fit
    np.random.seed(42)

    def to2d(val):
        # Wrap scalars as a 1x1 list; pass lists through unchanged.
        return [[val]] if not isinstance(val, list) else val

    def validate_2d_array(arr, name, min_rows=1, min_cols=1):
        if not isinstance(arr, list):
            return f"Invalid input: {name} must be a 2D list."
        if len(arr) < min_rows:
            return f"Invalid input: {name} must have at least {min_rows} row(s)."
        expected_cols = None
        for i, row in enumerate(arr):
            if not isinstance(row, list):
                return f"Invalid input: {name} row {i} must be a list."
            if i == 0:
                expected_cols = len(row)
                if expected_cols < min_cols:
                    return f"Invalid input: {name} must have at least {min_cols} column(s)."
            elif len(row) != expected_cols:
                return f"Invalid input: {name} must have consistent column count."
            for j, val in enumerate(row):
                if not isinstance(val, (int, float)):
                    return f"Invalid input: {name}[{i}][{j}] must be numeric."
                if math.isnan(val) or math.isinf(val):
                    return f"Invalid input: {name}[{i}][{j}] must be finite."
        return None

    # Normalize and validate inputs
    y = to2d(y)
    x = to2d(x)
    groups = to2d(groups)
    err = validate_2d_array(y, "y", min_rows=2, min_cols=1)
    if err:
        return err
    if len(y[0]) != 1:
        return "Invalid input: y must be a column vector (single column)."
    err = validate_2d_array(x, "x", min_rows=2, min_cols=1)
    if err:
        return err
    err = validate_2d_array(groups, "groups", min_rows=2, min_cols=1)
    if err:
        return err
    if len(groups[0]) != 1:
        return "Invalid input: groups must be a column vector (single column)."
    if len(y) != len(x) or len(y) != len(groups):
        return "Invalid input: y, x, and groups must have the same number of rows."

    # Validate y values are binary (0 == 0.0 and 1 == 1.0, so this covers floats)
    y_flat = [row[0] for row in y]
    if not all(val in (0, 1) for val in y_flat):
        return "Invalid input: y must contain only 0 or 1 values."

    # Validate alpha
    if not isinstance(alpha, (int, float)):
        return "Invalid input: alpha must be numeric."
    if math.isnan(alpha) or math.isinf(alpha):
        return "Invalid input: alpha must be finite."
    if not (0 < alpha < 1):
        return "Invalid input: alpha must be between 0 and 1."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Handle x_random
    if x_random is not None:
        x_random = to2d(x_random)
        err = validate_2d_array(x_random, "x_random", min_rows=2, min_cols=1)
        if err:
            return err
        if len(x_random) != len(y):
            return "Invalid input: x_random must have the same number of rows as y."

    try:
        # Convert to numpy arrays
        y_arr = np.array(y_flat, dtype=float)
        x_arr = np.array(x, dtype=float)
        groups_flat = [int(row[0]) for row in groups]
        groups_arr = np.array(groups_flat)

        # Add intercept to fixed effects if requested
        if fit_intercept:
            intercept_col = np.ones((len(x_arr), 1))
            x_arr = np.hstack([intercept_col, x_arr])

        # Random intercepts: one 0/1 indicator column per group, all
        # sharing a single variance parameter (ident value 0)
        unique_groups = np.unique(groups_arr)
        n_groups = len(unique_groups)
        intercept_vc = np.zeros((len(y_arr), n_groups))
        for i, g in enumerate(groups_arr):
            group_idx = np.where(unique_groups == g)[0][0]
            intercept_vc[i, group_idx] = 1
        vc_blocks = [intercept_vc]
        ident_blocks = [np.zeros(n_groups, dtype=int)]

        # Random slopes: scale each group's indicator column by the
        # random-effects predictor; each predictor gets its own
        # variance parameter (ident values 1, 2, ...)
        if x_random is not None:
            xr_arr = np.array(x_random, dtype=float)
            for j in range(xr_arr.shape[1]):
                vc_blocks.append(intercept_vc * xr_arr[:, j:j + 1])
                ident_blocks.append(np.full(n_groups, j + 1, dtype=int))
        exog_vc = np.hstack(vc_blocks)
        ident = np.concatenate(ident_blocks)

        # Fit the model by variational Bayes
        model = BinomialBayesMixedGLM(
            endog=y_arr,
            exog=x_arr,
            exog_vc=exog_vc,
            ident=ident
        )
        result = model.fit_vb()

        # Extract fixed effects (Bayesian posterior means and SDs)
        fe_mean = result.fe_mean
        fe_sd = result.fe_sd

        # Compute z-scores and two-sided p-values
        z_scores = fe_mean / fe_sd
        p_values = 2 * sp_stats.norm.sf(np.abs(z_scores))

        # Compute confidence intervals
        z_crit = sp_stats.norm.ppf(1 - alpha / 2)
        ci_lower = fe_mean - z_crit * fe_sd
        ci_upper = fe_mean + z_crit * fe_sd

        # Build output
        output = []

        # Section 1: fixed effects header (9 columns - this is the max width)
        fixed_header = ['fixed_effects', 'parameter', 'coefficient', 'std_error',
                        'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'odds_ratio']
        output.append(fixed_header)

        # Section 1: fixed effects rows
        param_names = ['intercept'] if fit_intercept else []
        param_names += [f'x{i+1}' for i in range(x_arr.shape[1] - (1 if fit_intercept else 0))]
        for i, param_name in enumerate(param_names):
            odds_ratio = math.exp(float(fe_mean[i]))
            row = ['', param_name, float(fe_mean[i]), float(fe_sd[i]), float(z_scores[i]),
                   float(p_values[i]), float(ci_lower[i]), float(ci_upper[i]), odds_ratio]
            output.append(row)

        # Section 2: random effects header (padded to 9 columns)
        random_header = ['random_effects', 'parameter', 'variance', 'std_dev', None, None, None, None, None]
        output.append(random_header)

        # Section 2: random effects variance components (padded to 9 columns).
        # In statsmodels, the vcp parameters are log standard deviations of
        # the random effects, so exp() gives the SD and squaring the variance.
        vcp_mean = result.vcp_mean
        for i in range(len(vcp_mean)):
            param_name = 'intercept' if i == 0 else f'random_x{i}'
            std_dev = math.exp(float(vcp_mean[i]))
            variance = std_dev ** 2
            row = ['', param_name, variance, std_dev, None, None, None, None, None]
            output.append(row)

        # Section 3: model statistics (padded to 9 columns).
        # Traditional llf/AIC/BIC are not defined for the variational Bayes
        # fit, so these entries are reported as None.
        output.append(['log_likelihood', None, None, None, None, None, None, None, None])
        output.append(['aic', None, None, None, None, None, None, None, None])
        output.append(['bic', None, None, None, None, None, None, None, None])

        return output
    except Exception as e:
        return f"Error fitting GLMM: {str(e)}"